ABSTRACT


In view of the trend toward the representation of signals as
physical observables, characterized by vectors in an abstract signal
space,  rather  than  as  time  or  frequency  functions,  it  is  desirable  to 
define  dimensionality  in  a  manner  which  would  be  independent  of  the 
choice  of  basis  for  the  vectors. 

In  this  work,  the  dimensionality  of  a  collection  of  signals  is 
defined as equal to the number of free parameters required in a
hypothetical signal generator capable of producing a close approximation
to each signal in the collection. Thus defined, dimensionality becomes
a relationship between the vectors representing the signals. This
relationship need not be a linear one, and does not depend on the basis
onto  which  the  vectors  are  projected  in  signal  measuring  processes. 
It  represents  a  lower  bound  on  the  number  of  coefficients  required  to 
describe  the  signals,  no  matter  how  sophisticated  the  representation 
scheme,  and  thus  provides  an  index  of  the  redundancy  in  a  given 
representation. 

A  computer  program  for  estimating  this  dimensionality  from 
the  signal  coefficients  on  an  arbitrary  orthogonal  basis  is  developed. 
The  program,  suitable  for  an  IBM  7094  computer,  is  based  on  some 
results from a related multidimensional scaling problem, and utilizes
an inverse relationship between the variance in interpoint distance
within a hypersphere, and the dimensionality of the hypersphere.
This  method  is  independent  of  the  choice  of  orthonormal  basis,  and 
no  prior  knowledge  of  the  analytical  form  of  the  signals  is  assumed. 

The  validity  of  the  program  is  verified  by  using  it  to  estimate 
the  dimensionality  of  signals  of  known  structure,  and  therefore  of 
known  dimensionality. 
INTRODUCTION


There has been a tendency in the past for authors of
communications papers to speak of a signal synonymously with
its representation as a time or frequency function. Thus a
particular signal may be designated as a "sine wave", or "square
wave", or as a "band-limited" signal. In more recent publications
(1, 2)* it has been demonstrated that such an approach tends to
obscure  the  true  nature  of  signals.  In  particular,  it  assumes 
that  in  the  noise-free  case,  complete  knowledge  of  the  signal 
is  possible,  while  in  fact  the  only  access  one  has  to  a  physical 
signal  is  through  a  measuring  device,  or  filter,  having  finite 
capabilities  and  therefore  able  to  yield  only  an  approximation  to 
the  signal. 

To  better  appreciate  signals  as  physical  observables, 
only  partially  accessible,  it  has  proven  useful  to  consider  them 
as vectors, $|F\rangle$, on an abstract infinite-dimensional signal
space, V. (Where practical, Dirac's notation, as adapted by
Lai (2), will be followed in this report.) In this representation,
the signal energy is characterized by the square of the vector
norm, $\langle F|F\rangle$, and the structure by the vector "direction."


* Whole numbers in parentheses refer to references listed
beginning on Page 67.


Determining a given relationship between signals, e.g., correlation,
summation, etc., thus becomes an operation with the vectors themselves,
and is independent of any time or frequency basis. An
attribute  of  signal  collections  which  will  be  discussed  in  detail 
in  this  document  is  dimensionality. 

The  definition  of  signal  dimensionality  is  at  best  a 
difficult  task,  but  several  ad  hoc  definitions  are  in  use,  the 
most common being based on a time-bandwidth product. Here,
one  speaks  of  the  dimensionality  of  individual  signals,  not  of 
classes  or  collections.  If  the  signal,  which  is  specified  as 
a  time  function,  has  negligible  energy  in  the  frequency  components 
above  B  cycles  per  second,  it  will  have  2B  degrees  of  freedom 
per second. The dimensionality of a $\tau$-second portion of this
signal is then defined to be $2B\tau$. The usefulness of such a
definition lies in the sampling theorems which permit recovery
of the $\tau$-second portion of the time function from $2B\tau$ uniformly
spaced (in time) amplitude samples, with an error which varies
inversely with the product $B\tau$. (3, 4, 5)

Another  definition  which  finds  frequent  application 
is  based  on  an  orthonormal  expansion  of  the  signal,  again 
usually  expressed  as  a  time  function.  The  signal  is  represented 
as  a  linear  sum  of  weighted  orthonormal  components,  each 
weight being the inner product of the signal with the corresponding
component  function.  The  number  of  such  components  required 
to  represent  the  signal  to  within  a  specified  energy  error  is  then 
defined  to  be  the  dimensionality.  The  previous  definition  is 
a  special  case  of  this  one,  where  the  orthonormal  components 
are  shifted  cardinal  functions. 
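
The orthonormal-expansion definition above can be illustrated with a minimal Python sketch. It counts how many components must be retained before the neglected energy falls below a stated fraction of the total. The function name `components_needed`, the decision to retain the largest coefficients first, and the expression of the energy error as a fraction of total energy are assumptions made for this illustration and are not taken from the report.

```python
def components_needed(coefficients, energy_error):
    """Count the orthonormal components required so that the neglected
    energy is at most `energy_error` times the total signal energy.

    Assumption (not from the report): the largest coefficients are
    retained first, which gives the smallest count for a fixed basis.
    """
    energies = sorted((abs(a) ** 2 for a in coefficients), reverse=True)
    total = sum(energies)
    if total == 0.0:
        return 0
    residual = total
    for count, energy in enumerate(energies, start=1):
        residual -= energy  # energy still unrepresented after `count` terms
        if residual <= energy_error * total:
            return count
    return len(energies)


# Hypothetical coefficients: two dominant components and two weak ones.
example = [3.0, 0.1, 4.0, 0.05]
needed = components_needed(example, 0.01)
```

For the hypothetical coefficients shown, two components carry all but a small fraction of the energy, so the count depends strongly on the error criterion and on the basis chosen, which is the weakness the following paragraphs address.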

Despite  their  usefulness,  such  definitions  are  unsatisfying 
in  view  of  the  trend  toward  thinking  of  signals  as  physical 
observables  rather  than  as  functions.  It  would  seem  more 
satisfactory  to  define  dimensionality  in  terms  of  some 
relationship  involving  the  vectors  representing  a  given 
collection  of  signals.  Thus  defined,  dimensionality  would 
become  an  intrinsic  property  of  the  collection. 

In  order  to  obtain  meaning  from  a  signal,  some 
measurement  must  be  performed  upon  it.  Such  a  measurement 
usually  has  the  form  of  a  projection  of  the  signal  vector 
onto prescribed basis vectors, or "patterns",
$\langle\phi_1|, \langle\phi_2|, \langle\phi_3|, \ldots, \langle\phi_n|$,
which characterize the measuring apparatus.
Since  the  only  access  an  observer  has  to  the  signal  is  through 
these  measurements,  a  technique  must  be  found  for  evaluating 
the  dimensionality  of  the  signal  collection  in  terms  of  these 
measurements. 
Plan of the Report


The  purpose  of  this  work  is  twofold: 

1.  To suggest a generalized definition of signal-collection
dimensionality.

2.  To  develop  a  technique  for  estimating  this 
intrinsic  dimensionality  for  collections  of 
signals. 

In  the  present  chapter,  the  problem  has  been  roughly 
described.  In  Chapter  II,  this  description  will  be  extended  to 
a  precise  mathematical  formulation,  and  the  generalized 
definition  of  dimensionality  will  be  given. 

The  technique  for  estimation  of  signal  class  dimensionality 
is  based  on  a  relationship  between  the  dimensionality  and  the 
geometry  of  the  vectors  representing  the  signals  in  signal 
space.  Chapter  III  is  in  two  parts.  The  first  part  sets 
forth  certain  assumptions  about  the  signal  collection  which 
must  be  valid  if  the  estimation  of  the  dimensionality  is  to  be 
feasible.  The  second  part  discusses  an  inverse  relationship 
between  the  variance  in  interpoint  distances  in  a  hypersphere  and 
the  dimensionality  of  the  hypersphere. 
In Chapter IV, this relationship in hyperspheres is
utilized  in  developing  a  technique  for  estimating  the  dimensionality 
of  signal  collections.  Nothing  is  assumed  known  about  the 
signals  with  the  exception  of  their  spectral  coefficients  on 
some  arbitrary  orthonormal  basis.  A  requirement  of  the 
technique  is  that  the  final  result  remain  invariant  under  change 
of  this  basis. 

In  Chapter  V,  a  computer  program  outline  and 
simplified  flow  chart  are  given  which  realize  the  technique 
developed  in  Chapter  IV.  To  obtain  the  greatest  generality, 
the  program  must  be  an  iterative  one,  and  its  usefulness  is 
predicated  on  the  availability  of  a  large-scale  automatic  digital 
computer. 

In  Chapter  VI,  examples  of  the  application  of  the 
technique  to  several  collections  of  signals  are  given.  The 
computer  used  in  these  tests  was  an  IBM  7094. 

Chapter  VII  summarizes  and  discusses  the  program  and 
the  experimental  results. 
FORMULATION OF THE PROBLEM

A  common  approach  in  the  representation  and  analysis  of  a 
collection  of  experimentally  obtained  physical  signals  is  to  find  the 
spectral  coefficients  of  the  signal  when  expanded  in  a  set  of  basis 
functions $|\phi_i\rangle$, $i = 1, 2, \ldots$ In order to minimize the effects of
slight numerical errors, the basis functions are required to be
uncorrelated, for example,

$$\langle\phi_i|\phi_j\rangle = \int dt\,\phi_i^*(t)\,\phi_j(t) = k_j\,\delta_{ij} \qquad (1)$$

for all i and j in the case where the basis functions $\phi_j(t)$ are functions
of time. For convenience, the $k_j$ are set equal to unity through
suitable scale factors associated with the $|\phi_j\rangle$.

If the orthonormal set of basis functions is complete, the
time-function representation of a signal may be written as

$$f(t) = \sum_{i=1}^{\infty} \phi_i(t)\,\langle\phi_i|F\rangle \qquad (2)$$

or

$$f(t) = \sum_{i=1}^{\infty} a_i\,\phi_i(t) \qquad (3)$$

where

$$a_i = \int dt\,\phi_i^*(t)\,f(t) = \langle\phi_i|F\rangle$$
Using a familiar signal representation concept, the signal may
be represented as a vector defined on a hyperspace having the $\phi_i(t)$ as
a basis. (1) The $\langle\phi_i|$ are considered as unit vectors, and the signal
vector $|F\rangle$ has coordinates $a_i$ on this basis.

The choice of a suitable basis is somewhat arbitrary. Provided
that both bases are complete, that is, that all of the signals lie wholly
within the subspace spanned by each basis, each signal could just as
properly be represented by its coordinates in a second basis $|\phi'_i\rangle$:

$$|F\rangle = \sum_{i=1}^{\infty} a_i\,|\phi_i\rangle = \sum_{i=1}^{\infty} b_i\,|\phi'_i\rangle \qquad (4)$$

The vector representing the signal does not depend on the basis; rather,
the coordinates of the vector are dependent on the choice of the basis.
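
The distinction between a vector and its coordinates can be made concrete with a small sketch. The Python fragment below, offered only as an illustration, expresses two hypothetical signal vectors on a two-dimensional orthonormal basis and on a second basis rotated by an arbitrary angle. The helper names and the numerical values are assumptions introduced for the example.

```python
import math


def rotate_basis(coords, angle):
    """Coordinates of the same 2-D vector on an orthonormal basis
    rotated by `angle` radians relative to the original basis."""
    x, y = coords
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * y, -s * x + c * y)


def norm(coords):
    """Square root of the signal energy (the vector norm)."""
    return math.sqrt(sum(v * v for v in coords))


def distance(p, q):
    """Euclidean distance between two coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Two hypothetical signals on the first basis.
f = (3.0, 4.0)
g = (1.0, -2.0)

# The same two signals on a second orthonormal basis.
f_new = rotate_basis(f, 0.7)
g_new = rotate_basis(g, 0.7)
```

The coordinates in `f_new` differ from those in `f`, while `norm(f_new)` and `distance(f_new, g_new)` agree with their values on the first basis to within rounding. Quantities of this kind, which depend only on the vectors, are the ones on which a basis-independent definition of dimensionality must rest.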

From  a  practical  standpoint,  there  is  much  to  be  gained  from 
selecting a set of basis functions such that a minimum of vector
components suffice to represent the signal to within some error criterion.

In  practice,  signals  to  be  investigated  are  often  empirical  rather  than 
being  given  as  analytical  functions  of  time,  frequency,  or  other 
variables.  The  spectral  coefficients  of  the  signal  are  obtained  by 
passing  the  signal  through  a  series  of  filters,  each  "matched"  to  one 
of  the  basis  patterns,  and  sampling  at  the  appropriate  instant  as  shown 
in  Figure  1.  These  sample  values  are  then  the  projections  of  the  signal 
in  question  on  each  of  the  basis  patterns.  To  avoid  the  expense  of  a 
large  number  of  filters,  the  basis  functions  are  chosen  such  that  a 
small number of basis functions spans the same subspace of V as the
signal  to  within  a  prescribed  energy  error. 

The desirability of selecting the fewest numbers for representing
the signals is more basic than simple considerations of parsimony,
however. Consider the model for a generator of one-sided,
single-epoch signals shown in Figure 2.

The output of the filter, $|F\rangle$, is the signal which will be applied
to the input of the n-dimensional orthonormal filter to determine the $a_i$.
If $|F|0\rangle$ is completely unspecified, the probability that the point
representing this signal in the n-dimensional space will be within a
hypersphere  of  some  specified  radius  cannot  be  determined.  A 
collection  of  observed  signals  may  be  thought  of  as  arising  from  such 
a  filter  which  is  free  to  vary  randomly  between  consecutive  signal 
outputs.  The  points  in  the  n-dimensional  representation  space  will 
then  be  randomly  distributed  through  the  space. 

Such a model is clearly inconsistent with real-world signal
sources, which are not free to vary arbitrarily but are subject to
definite constraints. If the signal is noisy, these constraints are
"soft", becoming better defined as the signal-to-noise
ratio increases. Further discussion of the effects of noise on
this  formulation  will  be  deferred  to  Chapter  III. 

These constraints will be reflected in the distribution of signal
coordinates in the representation space. For example, a maximum
energy restriction on the output signals,

$$\langle F|F\rangle \le R \qquad (5)$$

would require that the points in the representation space lie within a
hypersphere of radius $R^{1/2}$.

A more realistic signal generator model which includes
constraints on the variations between successive output signals is shown
in  Figure  3.  The  constraints  are  introduced  as  a  finite  number  of 
filter  parameters  which  may  vary  at  random  between  signals. 

Now, if the class of signals so generated is representable to
within an acceptable error on a space spanned by n patterns, the
representation of the $j$th signal will be of the form:

$$|F_j|0\rangle = \sum_{i=1}^{n} a_i(\psi_j)\,|\phi_i\rangle - |e_j\rangle \qquad (6)$$

where

$$\psi_j = [\psi_1(j), \psi_2(j), \ldots, \psi_k(j)]$$

and $|e_j\rangle$ is the error in representing the $j$th signal.

The  number  n  is  usually  referred  to  as  the  "dimensionality"  of 
the  signal  class.  For  the  purpose  of  this  discussion,  n  will  be  termed 
the  linear  dimensionality  of  the  class.  The  number  k  will  be  defined 
as  the  intrinsic  dimensionality  of  the  class.  The  linear  and  intrinsic 
dimensionalities are related only by an inequality, $k \le n$. The value of
n can be considerably greater than that of k, as in the case where
=  0    t  <  0


For arbitrary values of $\psi$, expansion of this class of signals on an
orthogonal basis will require a great number of coefficients, while
k = 2.

The  difference  between  n  and  k  represents  a  redundancy  in 
representation which cannot in general be removed by a linear
transformation of basis.
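
The gap between n and k can be seen in a small numerical sketch. The Python fragment below uses a hypothetical two-parameter family, rectangular pulses of variable amplitude and delay, which is chosen for simplicity and is not the example given in the report. The linear dimensionality is found by Gram-Schmidt orthogonalization; the function names, pulse width, and record length are likewise assumptions.

```python
import math


def pulse(amplitude, delay, width=4, length=64):
    """Sampled rectangular pulse with two free parameters
    (amplitude and delay); a hypothetical signal family."""
    return [amplitude if delay <= t < delay + width else 0.0
            for t in range(length)]


def linear_dimensionality(signals, tol=1e-9):
    """Number of orthonormal patterns needed to span the collection,
    obtained by Gram-Schmidt orthogonalization of the signals."""
    basis = []
    for s in signals:
        residual = list(s)
        for b in basis:
            proj = sum(x * y for x, y in zip(residual, b))
            residual = [x - proj * y for x, y in zip(residual, b)]
        size = math.sqrt(sum(x * x for x in residual))
        if size > tol:  # the signal has a component outside the current span
            basis.append([x / size for x in residual])
    return len(basis)


# Three amplitudes and sixteen non-overlapping delays.
signals = [pulse(a, d) for a in (0.5, 1.0, 2.0) for d in range(0, 64, 4)]
n = linear_dimensionality(signals)  # patterns required by a linear expansion
k = 2                               # free parameters of the generator
```

Pulses at different delays do not overlap and are therefore orthogonal, so every new delay demands a new pattern: n equals the number of distinct delays (sixteen here) although only two parameters generate the whole collection. No rotation of the basis can reduce n, since the span of the collection is unchanged by a change of basis.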

The  problem  to  be  considered  may  now  be  stated  as  follows: 

Given a collection of signals whose spectral coefficients $a_i$ on
a basis $|\phi_i\rangle$ are known, assume that $a_i = a_i(\psi_j)$,
where $\psi_j = [\psi_1(j), \psi_2(j), \ldots, \psi_k(j)]$, and from these $a_i$ estimate the
value of k.
GEOMETRICAL CONSIDERATIONS


A.  Assumptions 

Although the approach to be developed here is intended to
achieve the greatest possible generality, it was found necessary to
place three restrictions on the hypothetical signal generators:

1.  $|F_p|0\rangle \ne |F_q|0\rangle$ for all $p \ne q$  (8)

As previously stated, it is advantageous from an instrumentation
standpoint to use an orthonormal basis for representing the signals
$|F_p\rangle$ and $|F_q\rangle$. Thus the above restriction may be equivalently stated

$$a_i(\psi_p) \ne a_i(\psi_q) \quad \text{for all } p \ne q \qquad (9)$$

where

$$\psi_j = [\psi_1(j), \psi_2(j), \ldots, \psi_k(j)]$$

Were this restriction to be invalid, even for just two values of
p and q, say p' and q', so that

$$a_i(\psi_{p'}) = a_i(\psi_{q'}) \qquad (11)$$

then the apparent dimensionality of the collection of signals would be
spuriously high. The technique to be developed in Chapter IV operates
on the locus of the signal vectors, and, for example, loops in the locus
of vectors in the single-parameter case will appear to have been produced
by a two-parameter signal generator.
An equation of the form


is not possible here because the terms $\langle\phi_i|n\rangle$ in the r's are not
deterministic. If the noise is white and has zero mean, the s will
determine the most likely coordinates of each r, and the actual
coordinates  will  have  a  symmetric  probability  distribution  surrounding 
this  point. 

In Chapter IV, it will be postulated that a single-parameter
signal generator will produce signal vectors whose locus in the signal
space V will be a curve. The intrinsic dimensionality estimation
technique will be based on this hypothesis. For large S/N energy
ratios, the noise hopefully will do no violence to this hypothesis.
However, as the noise energy increases, one would expect a point to
be reached where a "threshold" effect occurred and the reliability of
the dimensionality measurement would fall off rapidly. In the case of
white Gaussian noise, this threshold should occur when the variance of
the noise is of the order of magnitude of the least radius of curvature
of the locus of the signal vectors $r_j$. Since, when finite bases are
used, this locus may depend on the basis as well as on the signals,
the effect of noise on the measurement cannot be predicted here.
However, as a rough indication, the S/N ratio at which white Gaussian
noise will have variance exceeding the least radius of curvature of the
locus of vectors representing square pulses expanded on an
orthonormalized exponential basis will be evaluated in Chapter VII.
B.  Distances in a Hypersphere

At this point it is necessary to digress briefly and to
consider the distribution of interpoint distances in an N-dimensional
homogeneous hypersphere. This distribution was first investigated
by Deltheil (7) and more recently by Hammersley (8) and Lord (9).

Consider a sphere of radius a and dimension N. Let the
distance between two points within the sphere be designated by r,
and for convenience let

$$\lambda = r/2a \quad \text{so that} \quad 0 \le \lambda \le 1 \qquad (15)$$

Deltheil  developed  an  expression  for  the  probability  density  function 
for  r,  and  evaluated  this  for  some  odd-integer  values  of  N.  He  did 
not  evaluate  the  cases  where  N  was  even. 

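Using an involved approach, Hammersley obtained the
following compact form of Deltheil's expression:

$$f_N(\lambda) = 2^N N \lambda^{N-1}\, I_{1-\lambda^2}\!\left(\tfrac{1}{2}N + \tfrac{1}{2},\ \tfrac{1}{2}\right) \qquad (16)$$

where

$$I_x(p, q) = \int_0^x Z^{p-1}(1 - Z)^{q-1}\, dZ \,/\, B(p, q)$$

and B is the Beta function.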

This form is good for all values of N. For the particular cases N = 1,
N = 2, and N = 3, i.e., a line, a circle, and a solid sphere, the above
expression reduces to easily handled expressions:

$$f_1(\lambda) = 2(1 - \lambda)$$

$$f_2(\lambda) = \frac{16\lambda}{\pi}\left[\cos^{-1}\lambda - \lambda(1 - \lambda^2)^{1/2}\right] \qquad (17)$$

$$f_3(\lambda) = 12\,(2\lambda^2 - 3\lambda^3 + \lambda^5)$$

for $0 \le \lambda \le 1$.

Lord obtained the same results through a more general
approach, and showed further that the distribution of distances is
asymptotically normal as N increases, and that the
second moment of the PDF is given by

$$\overline{\lambda^2} = 2a^2 N (N + 2)^{-1} \qquad (18)$$

From Equation 17, the mean may be calculated, and this is shown in
Table 1, along with $\overline{\lambda^2}$ (with a = 1/2 for normalization), together
with the variance in the interpoint distances about the mean.


TABLE 1  Distance Distributions in Hyperspheres


N          Mean λ       Mean λ²      Variance

1          .333333      .166666      .055556
2          .452712      .250000      .045056
3          .514286      .300000      .035510
∞          .707107      .500000      .000000
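
The entries of Table 1 for N = 1, 2, and 3 can be checked by direct numerical integration of the three densities of Equation 17. The following Python sketch does so with a simple midpoint rule. The function names and the step count are assumptions made for the illustration, and the density for N = 2 is written with the coefficient 16/π, which is the value that normalizes it to unit area.

```python
import math


def f1(x):
    """Distance density on a line segment (N = 1)."""
    return 2.0 * (1.0 - x)


def f2(x):
    """Distance density in a circle (N = 2)."""
    return (16.0 * x / math.pi) * (math.acos(x) - x * math.sqrt(1.0 - x * x))


def f3(x):
    """Distance density in a solid sphere (N = 3)."""
    return 12.0 * (2.0 * x ** 2 - 3.0 * x ** 3 + x ** 5)


def moment(density, power, steps=20000):
    """Midpoint-rule estimate of the integral of x**power * density(x)
    over the interval [0, 1]."""
    h = 1.0 / steps
    return h * sum(((i + 0.5) * h) ** power * density((i + 0.5) * h)
                   for i in range(steps))


def table_row(density):
    """Return (mean, second moment, variance) for one density."""
    mean = moment(density, 1)
    second = moment(density, 2)
    return mean, second, second - mean * mean


rows = {n: table_row(f) for n, f in ((1, f1), (2, f2), (3, f3))}
```

The second moments obtained in this way agree with N/(2(N + 2)), which is Equation 18 evaluated at a = 1/2, and the variances agree with the tabulated values to within the accuracy of the integration.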
The probability density functions for N = 1, N = 2, and N = 3
are shown in Figure 4. The function is asymptotically normal and the
variance decreases monotonically toward zero as N increases, the
limiting case being a $\delta$ function at $\lambda$ = 0.707107. This result will be
of fundamental importance in the sequel.
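
The decrease of the variance with N can also be observed by sampling. The sketch below, in Python, draws pairs of points uniformly from a hypersphere of radius 1/2 and estimates the mean and variance of the interpoint distance. The sampling scheme (a Gaussian direction scaled by a radius distributed as the Nth root of a uniform variate), the function names, the sample size, and the seed are all assumptions made for the illustration; this is not the procedure used in the report.

```python
import math
import random


def random_point_in_ball(n, rng, radius=0.5):
    """Uniformly distributed point inside an n-dimensional ball:
    Gaussian direction, radius distributed as radius * U**(1/n)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    size = math.sqrt(sum(x * x for x in g))
    r = radius * rng.random() ** (1.0 / n)
    return [r * x / size for x in g]


def distance_stats(n, pairs=4000, seed=0):
    """Monte Carlo estimate of (mean, variance) of the distance between
    two independent uniform points in an n-ball of radius 1/2."""
    rng = random.Random(seed)
    dists = []
    for _ in range(pairs):
        p = random_point_in_ball(n, rng)
        q = random_point_in_ball(n, rng)
        dists.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q))))
    mean = sum(dists) / pairs
    var = sum((d - mean) ** 2 for d in dists) / pairs
    return mean, var


estimates = {n: distance_stats(n) for n in (1, 3, 10, 50)}
```

With these settings the estimated variance falls steadily from N = 1 to N = 50 while the mean moves toward 0.707, in keeping with the limiting behavior described above. The estimates are subject to sampling error and should be read as a rough confirmation only.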


Figure 4  Distribution of Distance in a Hypersphere


IV.  DISCUSSION OF APPROACH


Consider the output of the signal generator of Figure 3 as
projected on some orthonormal basis complete for this collection

$$|F_j\rangle = \sum_{i=1}^{\infty} a_i(\psi_j)\,|\phi_i\rangle \qquad (19)$$

where $\psi_j = [\psi_1(j), \psi_2(j), \ldots, \psi_k(j)]$.

For each value of j, a point in a signal subspace having the $\phi_i$ as a
basis will be generated, having coordinates $a_1, a_2, \ldots, a_n$. The locus
of these points will reflect the value of k, since this locus may be
described in terms of k generalized coordinates, provided that the
three assumptions in Chapter III are valid for the signal generator $|F\rangle$.
The problem of determining the value of k is thus equivalent to
determining whether this locus is a curve (k = 1), a surface (k = 2), or an
m-dimensional solid (k = m). Peano's continuous mapping of an interval
onto the whole of a square shows that the dimension of a space cannot
be defined as the number of parameters required to describe the
space. (10) That a single-parameter signal generator will produce a
curve in signal space cannot therefore be rigorously shown. For
continuously varying $\psi$'s, this will be taken as an assumption under the
restrictions imposed in the preceding chapter, subject to experimental
verification. From this point on, the problem of estimating the value
of k from the spatial distribution of the points representing the signals
will be taken to be equivalent to finding the number of generalized
coordinates needed to describe the locus of these points.

A  related  problem  occurs  in  the  field  of  Experimental 
Psychology,  and  is  known  as  the  problem  of  Multidimensional  Scaling. 
Briefly,  this  problem  is  stated  as  follows: 

"Given  the  experimental  dissimilarities  of  n 
objects,  find  a  set  of  n  points  whose  interpoint 
distances  are  a  monotone  function  of  these 
dissimilarities"  (11,  12). 

A  computer  routine  for  multidimensional  scaling  which  is  based 
on  the  inverse  relationship  between  interpoint  distance  variance  and  the 
dimension  of  a  hypersphere  has  been  written  by  Roger  Shepard.  (12) 

This program begins with a collection of n points, $P_1, P_2, \ldots, P_n$,
each representing some psychological quantity: color perception,
interpretation of facial expression, etc. The only knowledge of the
relationship between the $P_i$ is in the form of a similarity ranking. A
configuration is sought such that this ranking is inversely duplicated
by the distances between the $P_i$. That is, if the quantities represented
by $P_i$ and $P_k$ are known to be quite similar, then the distance between
points $P_i$ and $P_k$ should be small. If the similarity between $P_i$ and $P_k$
is denoted by $s_{ik}$ and the distance between points $P_i$ and $P_k$ is denoted
by $d_{ik}$, then for all n(n-1)/2 pairs of points, the ranking of distances
and dissimilarities is to be preserved.
(20)
It would be expected that given only the set of inequalities
ranking the $s_{ik}$, the corresponding configuration of points having the
required ranking of $d_{ik}$ would be far from unique. Surprisingly, it
can be demonstrated that for moderate values of n ($n \approx 50$), and for
a final configuration dimensionality of 3, the resulting configuration
is very well defined. (13) At the end of this analysis not only is it
possible  to  obtain  the  proper  distance  ranking,  but  actual  measurements 
of  these  distances  as  well.  That  measurements  can  be  obtained  from 
non-metric  ranking  data  is  not  too  startling  when  it  is  considered  that 
the  full  set  of  1225  inequalities  is  highly  redundant  if  the  configuration 
is  of  small  dimensionality. 

The  method  used  is  as  follows:  The  n  points  are  first  located 
at  the  n  vertices  of  a  regular  n-1  dimensional  simplex.  The  points 
are then perturbed in directions which make the ranking of the $(n^2 - n)/2$
interpoint distances conform to the desired ranking. Next the points
are shifted so that $d_{ij}$ larger than the mean are increased, and $d_{ij}$
smaller than the mean are decreased. These two steps are iterated
until  the  configuration  becomes  stationary.  As  the  iterations  progress, 
the  ranking  is  maintained,  and  the  dimensionality  of  the  configuration 
decreased  from  n  to  its  final  value. 

An  anomaly  in  some  of  Shepard's  results  can  be  exploited  in 
solving  the  intrinsic  dimensionality  problem.  If  the  configuration  of 
points at some iteration in the multidimensional scaling program is
an arc of less than 180 degrees, this will be stretched out into a line.
Similarly, a hemispherical shell will be deformed into a plane surface.
In the case of multidimensional scaling, this yields a spuriously low
dimensionality for the final configuration. In the problem considered
here, however, this is just what is needed. Provided that the ranking
is not preserved over too much of the configuration at a time, curves
in n-dimensional space collapse into lines, surfaces into planes, and
so forth. The intrinsic dimensionality of the signal source thus is
reflected as the spatial dimensionality of the final configuration. The
procedure consists of iterating two processes which together collapse
the configuration of points representing the signals $|F_j\rangle$ on the $\phi$ basis.

The  first  process  increases  the  variance  in  interpoint  distances. 
Consider  a  configuration  of  points  in  a  plane  as  in  Figure  5. 

The projections of the vector from 0 to point $P_i$ onto the X basis
are given by $a_{i1}$ and $a_{i2}$, where the first subscript denotes the point and
the second subscript denotes the basis vector.

For an orthonormal basis, the distance between points i and j
is given by

$$d_{ij} = \left[(a_{i1} - a_{j1})^2 + (a_{i2} - a_{j2})^2\right]^{1/2} \qquad (21)$$

or, in n dimensions

$$d_{ij} = \left[\sum_{k=1}^{n} (a_{ik} - a_{jk})^2\right]^{1/2} \qquad (22)$$
Figure 5  Two-Dimensional Configuration


Let the unit vector in the $X_i$ direction be $u_i$. Then the vector
from $P_i$ to $P_j$ is

$$\vec{\Delta}_{ij} = (a_{j1} - a_{i1})\,u_1 + (a_{j2} - a_{i2})\,u_2 \qquad (23)$$

or, in n dimensions

$$\vec{\Delta}_{ij} = \sum_{k=1}^{n} (a_{jk} - a_{ik})\,u_k \qquad (24)$$


Let the arithmetic mean of the $d_{ij}$ be $\bar{d}$.

To increase the variance in the $d_{ij}$, those distances $d_{ij} < \bar{d}$ should be
reduced, and $d_{ij} > \bar{d}$ should be increased. This may be done
incrementally as follows.

Consider

$$\alpha(d_{ij} - \bar{d}) = \Delta_{ij} = \text{the expansion factor} \qquad (25)$$

by which points $P_i$ and $P_j$ should be moved apart, ignoring for the
moment all other pairs of points which include either $P_i$ or $P_j$. It is
desired to increase the magnitude of $\vec{\Delta}_{ij}$ by the factor $\Delta_{ij}$. Thus the
point $P_i$ should be moved a distance $\Delta_{ij} d_{ij}/2$ in the direction $-\vec{\Delta}_{ij}$,
and $P_j$ should be moved a distance $\Delta_{ij} d_{ij}/2$ in the direction $+\vec{\Delta}_{ij}$.

The final coordinates for the point $P_j$ are thus given by

$$a_{jk} \rightarrow a_{jk} + \frac{\Delta_{ij}}{2}\,(a_{jk} - a_{ik}) \qquad (26)$$

When the entire collection of points is considered, the shifting
of the point is governed by the vector sum of the $\vec{\Delta}_{ij}$ weighted by the
corresponding $\Delta_{ij}/2$, as shown in Figure 6. In this case the final
position of $P_j$ is

$$a_{jk} \rightarrow a_{jk} + \frac{1}{2} \sum_{i=1}^{m} (a_{jk} - a_{ik})\,\Delta_{ij} \qquad (27)$$

where m is the total number of points.
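
One pass of the variance-increasing process may be sketched as follows. This Python fragment is an illustration only and is not the program written for the IBM 7094. It takes the expansion factor to be α times the difference between an interpoint distance and the mean distance, which is the reading of Equation 25 assumed here, and then applies the summed shift of Equation 27 to every point. The function names and the value of α are assumptions.

```python
import math


def distance(p, q):
    """Euclidean distance between two points (Equation 22)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def pairwise_distances(points):
    """Full symmetric matrix of interpoint distances."""
    m = len(points)
    return [[distance(points[i], points[j]) for j in range(m)]
            for i in range(m)]


def distance_variance(points):
    """Variance of the m(m-1)/2 interpoint distances about their mean."""
    d = pairwise_distances(points)
    m = len(points)
    values = [d[i][j] for i in range(m) for j in range(i + 1, m)]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def increase_variance(points, alpha):
    """Shift every point once according to Equation 27, with the
    expansion factor assumed to be alpha * (d_ij - mean distance)."""
    d = pairwise_distances(points)
    m, n = len(points), len(points[0])
    d_bar = sum(d[i][j] for i in range(m) for j in range(i + 1, m)) / (m * (m - 1) / 2)
    shifted_points = []
    for j in range(m):
        shifted = list(points[j])
        for i in range(m):
            if i == j:
                continue
            delta = alpha * (d[i][j] - d_bar)  # expansion factor for the pair
            for k in range(n):
                shifted[k] += 0.5 * delta * (points[j][k] - points[i][k])
        shifted_points.append(shifted)
    return shifted_points


# Three hypothetical points on a line.
before = [[0.0], [1.0], [3.0]]
after = increase_variance(before, alpha=0.1)
```

For the three points shown, one pass moves the outer points apart while the middle point shifts slightly toward its nearer neighbor, and the variance of the interpoint distances rises. All shifts are computed from the coordinates held before the pass, so the result does not depend on the order in which the points are visited.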

The second process restores the ranking of interpoint distances
within a small spherical region local to each point, of radius $\beta\bar{d}$. To
initially determine this set of inequalities, consider the point $P_j$. The
distances to each other point, $d_{ij}$ for $i \ne j$, are known from the
calculations for the first process. Those points for which $d_{ij} > \beta\bar{d}$ are
ignored, and for the remaining points, the interpoint distances are
ranked. This is repeated for each point in the collection. In all, m
chains of inequalities are obtained, with lengths depending on the choice
of the coefficient $\beta$.

$2\beta\bar{d}$ is the diameter of a hypersphere of dimension n centered
on the point $P_i$. Consider the simple case where this hypersphere
includes 3 points, which will be designated $P_i$, $P_j$, and $P_k$. There are
three interpoint distances, $d_{ij}$, $d_{ik}$, and $d_{jk}$. Suppose further that

$$d_{kj} < d_{ik} < d_{ij} \qquad (28)$$

Since the goal of this process is to preserve this
inequality, but not the actual values of the $d_{ij}$'s, etc., the d's are now
replaced by the numerical values of their rank in the inequality, $R_{abc}$,
where a designates the point at the hypersphere center, and b, c
designate the two points whose interpoint distance is being ranked.

For example, in the case above, a = i and

$$R_{ikj} = 1, \quad R_{iik} = 2, \quad R_{iij} = 3 \qquad (29)$$

For a hypersphere containing N points, the values of $R_{abc}$ will run
from 1 to $(N^2 - N)/2$, since $d_{ij} = d_{ji}$ and the $d_{jj}$ are zero and not ranked.
The value of a runs from 1 to m, the number of points in the collection.

Once the initial values for the R's have been determined, the
hyperspheres will no longer be required and all further calculations
will use the same values of b and c for each a.


Figure 6  Collapsing of a Curve into a Line (Initial Configuration; Final Configuration)
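
The construction of the initial inequality chains may be sketched in Python as follows. The fragment forms, for each center point, the set of points lying within the local hypersphere and ranks every interpoint distance among them. The function names, the zero-based point indices, the use of a dictionary keyed by point pairs, and the arbitrary resolution of tied distances are assumptions made for this illustration.

```python
import math
from itertools import combinations


def distance(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def local_rank_chains(points, beta):
    """For each center a, rank the interpoint distances among the points
    lying within beta * (mean distance) of a, the center included.

    Returns {a: {(b, c): rank}} with rank 1 for the shortest distance.
    Ties are broken arbitrarily (an assumption of this sketch).
    """
    m = len(points)
    d = [[distance(points[i], points[j]) for j in range(m)] for i in range(m)]
    d_bar = sum(d[i][j] for i, j in combinations(range(m), 2)) / (m * (m - 1) / 2)
    chains = {}
    for a in range(m):
        # Points inside the hypersphere centered on a (a itself qualifies).
        members = [b for b in range(m) if d[a][b] <= beta * d_bar]
        pairs = sorted(combinations(members, 2), key=lambda bc: d[bc[0]][bc[1]])
        chains[a] = {bc: rank for rank, bc in enumerate(pairs, start=1)}
    return chains


# Three hypothetical points on a line, first with a large region, then a small one.
points = [[0.0], [1.0], [3.0]]
wide = local_rank_chains(points, beta=2.0)
narrow = local_rank_chains(points, beta=0.75)
```

With the larger value of β every hypersphere contains all three points and each chain ranks all three distances. With the smaller value the chains shorten, and the chain for the isolated third point is empty. The pairs recorded in each chain are fixed at this stage and reused in every later iteration, as stated above.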


After each iteration of the variance-increasing process, the
ranking procedure is repeated, i.e., for the example where
$R_{ikj} = 1$, $R_{iik} = 2$, and $R_{iij} = 3$, the distances between points are
computed and ranked after the variance-increasing shifts have taken
place, and these ranks are designated $R'_{ikj}$, $R'_{iik}$, and $R'_{iij}$. If within
the $i$th chain of inequalities no violations of the original ranking have
occurred,

$$R'_{ikj} = R_{ikj} = 1$$

$$R'_{iik} = R_{iik} = 2 \qquad (30)$$

$$R'_{iij} = R_{iij} = 3$$

If, on the other hand, the inequalities no longer hold, then for at least two pairs of points $R \ne R'$. The more violent the scrambling of the inequalities, the greater will be the difference $R - R'$. This difference will be symbolized by

$$D_{ijk} = \gamma\,(R_{ijk} - R'_{ijk})$$  (31)

where γ is a constant.

When $D_{ijk} = 0$, the ranking of the interpoint distance is correct. For positive values of $D_{ijk}$, the distance between points $P_j$ and $P_k$ is too great, and for negative values it is too small. The procedure for restoring proper ranking is very similar to that used to increase variance, except that $-D_{ijk}$ will be used in place of $\Delta_{ij}$. The minus sign arises as a simple consequence of the definition of $D_{ijk}$. Also, an additional summation will be needed to accommodate the extra subscript.

Consider a pair of points, $P_j$ and $P_k$, within a hypersphere of radius $\beta\bar{d}$ centered on point $P_i$ before the first iteration. To tend to restore the distance between $P_j$ and $P_k$ to its proper rank in the i-th inequality chain, the coordinates for the point $P_j$ are shifted from $a_{jl}$ to

$$a'_{jl} = a_{jl} - D_{ijk}\,(a_{jl} - a_{kl})$$  (32)

In the event that the interpoint distance $d_{jk}$ is out of place in several of the inequality chains, the i subscript will be summed over the collection of points. This yields as the final set of coordinates for point $P_j$,

$$a'_{jl} = a_{jl} - \sum_{i=1}^{m} D_{ijk}\,(a_{jl} - a_{kl})$$  (33)

This yields the final position of $P_j$ provided that only its distance from $P_k$ is considered. In the desired application, the distances from other points must be considered as well, and the shifting of $P_j$ will depend on the vector sum of all possible D's. This vector sum is taken in the same way as in the variance-increasing process, and the required final position for $P_j$ is given by:

$$a'_{jl} = a_{jl} - \sum_{i=1}^{m} \sum_{\substack{k=1 \\ k \ne j}}^{m} D_{ijk}\,(a_{jl} - a_{kl})$$  (34)

For each value of i, there will be many values of k and j for which $D_{ijk}$ is not defined because either point $P_j$ or $P_k$, or both, lie outside the initial hypersphere of radius $\beta\bar{d}$. To facilitate computation, these will be defined to be zero.
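A minimal sketch of the rank-restoring shift follows. It assumes the form $D_{ijk} = \gamma (R_{ijk} - R'_{ijk})$ and a shift of each point by $-D_{ijk}$ times its offset from the other member of the pair, summed over all chains; the function names and the dictionary layout `{i: {(j, k): rank}}` are hypothetical and chosen only for this illustration.

```python
def rank_discrepancies(ranks, new_ranks, gamma):
    """D_ijk = gamma * (R_ijk - R'_ijk).

    Pairs missing from a chain are treated as D = 0, mirroring the
    convention that undefined values are set to zero.
    """
    return {i: {pair: gamma * (r - new_ranks.get(i, {}).get(pair, r))
                for pair, r in chain.items()}
            for i, chain in ranks.items()}


def restore_ranking(coords, discrepancies):
    """Shift every point by the summed correction over all chains i and all
    partners k; each stored pair (j, k) moves both of its end points."""
    new = [list(p) for p in coords]
    for chain in discrepancies.values():
        for (j, k), d_ijk in chain.items():
            for l, (aj, ak) in enumerate(zip(coords[j], coords[k])):
                new[j][l] -= d_ijk * (aj - ak)
                new[k][l] -= d_ijk * (ak - aj)
    return new


# Toy case: one chain in which the pair (0, 1) has a positive discrepancy.
coords = [[0.0], [1.0]]
D = rank_discrepancies({0: {(0, 1): 2}}, {0: {(0, 1): 1}}, gamma=0.1)
shifted = restore_ranking(coords, D)
print(shifted)
```

In this toy case the positive discrepancy pulls the two points toward each other, which is the behaviour described for a distance that is too great. All shifts are computed from the old coordinates, so the result does not depend on the order in which pairs are visited.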
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Iterating  these  two  processes  should  eventually  lead  to  a 
configuration  which  no  longer  changes,  the  two  processes  cancelling 
out.  Further  iterations  may  rotate  the  configuration,  but  the  "shape" 
should  remain  fixed,  as  should  the  variance  in  interpoint  distances. 

Once  the  final  configuration  has  been  obtained,  it  remains  to 
identify  the  linear  dimensionality  of  this  configuration.  A  method  for 
doing  this  is  based  on  two  theorems  from  matrix  theory.  (14) 

Theorem 1 The rank of a normal matrix is equal to the number of non-zero eigenvalues possessed by the matrix.

Theorem 2 The rank of any Gramian matrix of vectors is equal to the linear dimensionality of the space spanned by the vectors. Combining these theorems yields:

Corollary The linear dimensionality of the space spanned by a set of vectors is equal to the number of non-zero eigenvalues possessed by the Gramian matrix.

In the case of a matrix whose elements are inner products, the requirement that the matrix be normal and positive semidefinite is automatically fulfilled.
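As a small worked illustration of the corollary (the two vectors are chosen here for convenience and do not come from the thesis), take two collinear vectors in the plane and count the non-zero eigenvalues of their Gramian matrix:

```latex
v_1 = (1, 0), \quad v_2 = (2, 0), \qquad
G = \begin{pmatrix}
      \langle v_1 | v_1 \rangle & \langle v_1 | v_2 \rangle \\
      \langle v_2 | v_1 \rangle & \langle v_2 | v_2 \rangle
    \end{pmatrix}
  = \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}

\det(G - \lambda I) = (1 - \lambda)(4 - \lambda) - 4 = \lambda^2 - 5\lambda = 0
\quad \Longrightarrow \quad \lambda \in \{0,\ 5\}
```

Exactly one eigenvalue is non-zero, in agreement with the fact that the two vectors span a one-dimensional space.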

The remaining part of the procedure is as follows:

Starting from any point j, calculate the matrix of inner products of vectors from that point to every point in the configuration,

$$\langle r_{jk} | r_{jl} \rangle = b'_{kl}$$

The number of non-zero eigenvalues of this matrix will then be equal to the linear dimensionality of the configuration, and therefore will be equal to the intrinsic dimensionality of the original collection of signals. Taking into account that $|r_{jj}\rangle = 0$, and that

$$\langle r_{jk} | r_{jl} \rangle = \langle r_{jl} | r_{jk} \rangle$$

the matrix B' is

$$B' = \begin{bmatrix}
\langle r_{j1}|r_{j1}\rangle & \langle r_{j2}|r_{j1}\rangle & \cdots & \langle r_{jm}|r_{j1}\rangle \\
\langle r_{j2}|r_{j1}\rangle & \langle r_{j2}|r_{j2}\rangle & \cdots & \langle r_{jm}|r_{j2}\rangle \\
\vdots & \vdots & \ddots & \vdots \\
\langle r_{jm}|r_{j1}\rangle & \langle r_{jm}|r_{j2}\rangle & \cdots & \langle r_{jm}|r_{jm}\rangle
\end{bmatrix}$$  (35)

The equation to be solved is

$$\lvert B' - I\lambda \rvert = 0$$  (36)

For m signals projected on an n dimensional basis, at least (m-n) of the λ's so obtained must be zero. This will be verified in Chapter VI. Of the remaining n eigenvalues, k will be non-zero. The matrix B' will of course be different for different choices of the j-th point, but the final number of non-zero eigenvalues, k, will be constant. Several standard techniques for finding eigenvalues of symmetric matrices are available. The technique used in this work is a variant of Jacobi's method. During the variance maximizing process, the centroid of the configuration, C, is not constrained. This is of no consequence as far as the final dimensionality is concerned, but it is convenient that the centroid of the configuration be used as the origin of the vectors rather than arbitrary point j. A formula which can be used to calculate the inner product matrix, using the centroid as the vector origin, is given in Equation 37. (15)
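The eigenvalue count can be sketched as follows. The routine below is a plain cyclic Jacobi rotation scheme written for illustration; it is not the particular variant of Jacobi's method used in the thesis program, and the function names, tolerances, and test points are assumptions.

```python
import math


def gram_about_point(points, j):
    """Matrix b'_kl of inner products of the vectors drawn from point j."""
    vectors = [[x - x0 for x, x0 in zip(p, points[j])] for p in points]
    return [[sum(u * v for u, v in zip(rk, rl)) for rl in vectors] for rk in vectors]


def jacobi_eigenvalues(matrix, tol=1e-18, max_sweeps=50):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[p][q] ** 2 for p in range(n) for q in range(p + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                if a[p][p] == a[q][q]:
                    theta = math.pi / 4.0
                else:
                    theta = 0.5 * math.atan(2.0 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def count_nonzero(eigenvalues, threshold=1e-9):
    """Number of eigenvalues that differ from zero by more than a threshold."""
    return sum(1 for lam in eigenvalues if abs(lam) > threshold)


line = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (2.0, 2.0, 2.0), (3.0, 3.0, 3.0)]
plane = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
print(count_nonzero(jacobi_eigenvalues(gram_about_point(line, 0))))
print(count_nonzero(jacobi_eigenvalues(gram_about_point(plane, 0))))
```

Four points on a line give one non-zero eigenvalue and four points in a plane give two, whichever point is used as the origin. In practice a threshold is needed to decide what counts as zero; the value used here is arbitrary.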


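$$b_{jk} = \frac{1}{2m}\left[\sum_{j=1}^{m} d_{jk}^2 + \sum_{k=1}^{m} d_{jk}^2 - \frac{2}{m}\sum_{j=1}^{m-1}\sum_{k=j+1}^{m} d_{jk}^2 - m\,d_{jk}^2\right]$$  (37)

where: $b_{jk}$ = element of matrix B of scalar products referred to centroid,

$$b_{jk} = \langle r_{cj} | r_{ck} \rangle$$

$d_{jk}$ = distance between j-th and k-th point
m = total number of points in the configuration.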
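The centroid-referred scalar products can be computed from the distances alone. The sketch below uses the standard double-centering identity, $b_{jk} = \frac{1}{2m}[\sum_j d_{jk}^2 + \sum_k d_{jk}^2 - \frac{2}{m}\sum_{j<k} d_{jk}^2 - m\,d_{jk}^2]$, and checks it against scalar products computed directly from centered coordinates; the function names and the test configuration are assumptions of this example.

```python
import math


def centroid_gram_from_distances(d):
    """Scalar products referred to the centroid, from a distance matrix d."""
    m = len(d)
    sq = [[d[j][k] ** 2 for k in range(m)] for j in range(m)]
    col_sum = [sum(sq[j][k] for j in range(m)) for k in range(m)]  # sum over j
    row_sum = [sum(sq[j]) for j in range(m)]                       # sum over k
    pair_sum = sum(sq[j][k] for j in range(m - 1) for k in range(j + 1, m))
    return [[(col_sum[k] + row_sum[j] - (2.0 / m) * pair_sum - m * sq[j][k]) / (2.0 * m)
             for k in range(m)] for j in range(m)]


def centroid_gram_direct(points):
    """Same matrix computed directly: subtract the centroid, take inner products."""
    m, n = len(points), len(points[0])
    centroid = [sum(p[l] for p in points) / m for l in range(n)]
    centred = [[p[l] - centroid[l] for l in range(n)] for p in points]
    return [[sum(u * v for u, v in zip(a, b)) for b in centred] for a in centred]


points = [(0.0, 0.0), (1.0, 0.0), (3.0, 1.0), (2.0, 4.0)]
dist = [[math.dist(p, q) for q in points] for p in points]
B = centroid_gram_from_distances(dist)
B_direct = centroid_gram_direct(points)
print(max(abs(B[j][k] - B_direct[j][k]) for j in range(4) for k in range(4)))
```

The two matrices agree to rounding error, which shows that only the interpoint distances, and not the coordinates themselves, are needed to form B.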


It  is  now  possible  to  set  forth  an  outline  for  a  computer  program 
which  will  accept  the  coefficients  of  a  collection  of  signals  expanded  on 
any  basis,  and  from  these  coefficients  estimate  the  intrinsic  dimension- 
ality  of  the  collection. 
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OUTLINE  OF  COMPUTER  ROUTINE 


Step Number   Step Description

1   Set values for α, β, γ, and ε. Note: α determines the rate at which the interpoint distance variance increases; β defines the region about each point over which interpoint distance ranking is to be preserved; γ determines the vigor with which the program resists inequality violations; and ε sets the stopping criterion.

2   Read in coefficients of signals expanded on some orthonormal basis. Coordinate of the j-th signal on the k-th basis vector = $a_{jk}$, where j = 1, 2, ... m; k = 1, 2, ... n.

3   Calculate interpoint distances

    $$d_{ij} = \left[\sum_{k=1}^{n} (a_{ik} - a_{jk})^2\right]^{1/2}$$

4   Calculate the arithmetic mean of $d_{ij}$, $\bar{d}$:

    $$\bar{d} = \sum_{i=1}^{m}\sum_{j=1}^{m} d_{ij} / (m^2 - m)$$  (40)

5   Normalize data to make $\bar{d} = 1$.

6   Compute variance in normalized $d_{ij}$ = VAR. VARO = variance from previous iteration.

7   |VARO - VAR| < ε? Yes, go to step 20. No, go to step 8.

8   First iteration? Yes, go to step 9. No, go to step 11.

9   For each point i, store numbers of points j, k such that both $d_{ij} < \beta\bar{d}$ and $d_{ik} < \beta\bar{d}$. ($d_{ii} = 0$; $\bar{d} = 1$ by step 5.) (See Figure 7)

10  Rank all possible pairs of points obtained in step 9 in decreasing order of $d_{jk}$, and calculate $R_{ijk}$.

11  Calculate $\Delta_{ij}$:

    $$\Delta_{ij} = \alpha (d_{ij} - \bar{d})/\bar{d} = \alpha (d_{ij} - 1)$$  (25)

12  Compute new coordinates (Equation 27).

13  Compute new values of $d_{ij}$.

14  Compute new value of $\bar{d}$.

15  Normalize $a_{jk}$'s to set new $\bar{d} = 1$.

16  Compute $R'_{ijk}$.

17  Calculate $D_{ijk} = \gamma (R_{ijk} - R'_{ijk})$  (31)

18  Compute new coordinates (Equation 34).

19  Return to step 3.

20  Calculate B matrix

    $$b_{jk} = \frac{1}{2m}\left[\sum_{j=1}^{m} d_{jk}^2 + \sum_{k=1}^{m} d_{jk}^2 - \frac{2}{m}\sum_{j=1}^{m-1}\sum_{k=j+1}^{m} d_{jk}^2 - m\,d_{jk}^2\right]$$  (37)

21  Find eigenvalues of B, using the method of Jacobi. This is a standard subroutine.

22  Write out final coordinates, D matrix, B matrix.

23  Write out eigenvalues of B.

24  End.
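Steps 3 through 6 and step 11 of the outline are simple enough to sketch directly. The code below is an illustration, not the original Fortran; the function names are hypothetical, and the variance is taken here as the mean squared deviation of the off-diagonal distances from their mean, which is an assumption about the exact definition used.

```python
import math


def interpoint_distances(a):
    """Step 3: d_ij = [sum_k (a_ik - a_jk)^2]^(1/2)."""
    return [[math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q))) for q in a] for p in a]


def mean_distance(d):
    """Step 4: sum of all d_ij divided by (m^2 - m); the diagonal is zero."""
    m = len(d)
    return sum(map(sum, d)) / (m * m - m)


def normalise(a):
    """Step 5: rescale the coordinates so that the mean distance becomes 1."""
    d_bar = mean_distance(interpoint_distances(a))
    return [[x / d_bar for x in p] for p in a]


def distance_variance(d):
    """Step 6: variance of the off-diagonal distances about their mean."""
    m = len(d)
    d_bar = mean_distance(d)
    total = sum((d[i][j] - d_bar) ** 2 for i in range(m) for j in range(m) if i != j)
    return total / (m * m - m)


def deltas(d, alpha):
    """Step 11: Delta_ij = alpha * (d_ij - 1) for normalised distances."""
    m = len(d)
    return [[alpha * (d[i][j] - 1.0) if i != j else 0.0 for j in range(m)]
            for i in range(m)]


coords = normalise([[0.0], [1.0], [3.0]])
d = interpoint_distances(coords)
print(mean_distance(d), distance_variance(d), deltas(d, alpha=0.05)[0][1])
```

After normalisation the mean distance is 1, so distances shorter than average receive a negative Δ and longer ones a positive Δ, scaled by α.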


A  simplified  flow  chart  for  this  program  appears  in  Figure  8. 
[Figure 8: Simplified flow chart of the program]
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VI. 


EXPERIMENTAL  VERIFICATION 


The program previously outlined was compiled from Fortran statements on an IBM 7094 digital computer. This chapter describes the results obtained for several examples of one- and two-dimensional signal collections. In all cases, the basis used consists of one-sided, real, decaying exponentials, orthogonalized on the interval 0 to ∞. The exponents in the basis are -t, -2t, -3t, -4t, and -5t. These exponentials may be orthogonalized by either Schmidt's method or the method of Kautz. (16) In the time domain, the resulting orthonormal basis is described by Equation 41.


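$$\begin{aligned}
\phi_1 &= (2)^{1/2}\,(e^{-t})\,u(t) \\
\phi_2 &= (4)^{1/2}\,(-2e^{-t} + 3e^{-2t})\,u(t) \\
\phi_3 &= (6)^{1/2}\,(3e^{-t} - 12e^{-2t} + 10e^{-3t})\,u(t) \\
\phi_4 &= (8)^{1/2}\,(-4e^{-t} + 30e^{-2t} - 60e^{-3t} + 35e^{-4t})\,u(t) \\
\phi_5 &= (10)^{1/2}\,(5e^{-t} - 60e^{-2t} + 210e^{-3t} - 280e^{-4t} + 126e^{-5t})\,u(t)
\end{aligned}$$  (41)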


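or equivalently,

$$\Phi_1(s) = \frac{(2)^{1/2}}{s + 1}$$

$$\Phi_n(s) = \left[\Phi_{n-1}(s)\,\frac{s - (n-1)}{s + n}\right]\left[\frac{n}{n-1}\right]^{1/2} \quad \text{for } n > 1$$  (42)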
This basis will be referred to as the Kautz basis in the remainder of the paper. The Kautz basis was chosen for the tests because of the ease with which the coefficients of signal expansions may be computed.
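The orthonormality of the five basis functions can be checked numerically from their exponential coefficients, using the elementary integral $\int_0^\infty e^{-jt} e^{-kt}\,dt = 1/(j+k)$. The table `KAUTZ` below transcribes the coefficients of Equation 41; the names are chosen for this sketch only.

```python
import math

# (2n, coefficients of e^{-t}, e^{-2t}, ...) for phi_1 ... phi_5.
KAUTZ = [
    (2, [1]),
    (4, [-2, 3]),
    (6, [3, -12, 10]),
    (8, [-4, 30, -60, 35]),
    (10, [5, -60, 210, -280, 126]),
]


def inner_product(m, n):
    """<phi_m | phi_n> on the interval 0 to infinity (1-based indices)."""
    scale_m, coeff_m = KAUTZ[m - 1]
    scale_n, coeff_n = KAUTZ[n - 1]
    total = sum(cj * ck / (j + k)
                for j, cj in enumerate(coeff_m, start=1)
                for k, ck in enumerate(coeff_n, start=1))
    return math.sqrt(scale_m * scale_n) * total


for m in range(1, 6):
    print([round(inner_product(m, n), 9) + 0.0 for n in range(1, 6)])
```

Each printed row should show a 1 on the diagonal and zeros elsewhere, up to rounding. Because every inner product reduces to a finite sum of rationals, no numerical integration is required, which is one way to see why coefficients on this basis are easy to compute.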


The signals used are of two types:

1. Rectangular pulses (amplitude A, width τ)

2. Decaying real one-sided exponentials

All signals are of the single epoch type. Inner products of the Kautz basis with the two basic signals used in testing this program are tabulated below.


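1. s(t) = u(t), the unit step beginning at t = 0

$$\begin{aligned}
a_1 &= \langle \phi_1 | s \rangle = 1.414 \\
a_2 &= \langle \phi_2 | s \rangle = -1.000 \\
a_3 &= \langle \phi_3 | s \rangle = .815 \\
a_4 &= \langle \phi_4 | s \rangle = -.707 \\
a_5 &= \langle \phi_5 | s \rangle = .632
\end{aligned}$$  (43)

2. s(t) = u(t - τ), the unit step beginning at t = τ

$$\begin{aligned}
a_1 &= (2)^{1/2}\, e^{-\tau} \\
a_2 &= -(4)^{1/2}\,(2e^{-\tau} - \tfrac{3}{2} e^{-2\tau}) \\
a_3 &= (6)^{1/2}\,(3e^{-\tau} - 6e^{-2\tau} + \tfrac{10}{3} e^{-3\tau}) \\
a_4 &= -(8)^{1/2}\,(4e^{-\tau} - 15e^{-2\tau} + 20e^{-3\tau} - \tfrac{35}{4} e^{-4\tau}) \\
a_5 &= (10)^{1/2}\,(5e^{-\tau} - 30e^{-2\tau} + 70e^{-3\tau} - 70e^{-4\tau} + \tfrac{126}{5} e^{-5\tau})
\end{aligned}$$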
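Since a rectangular pulse of width τ is the difference of a unit step at t = 0 and a unit step at t = τ, its coefficients follow from the two sets of inner products above. The sketch below integrates each basis function term by term, $\int_\tau^\infty e^{-kt}\,dt = e^{-k\tau}/k$; the function names are hypothetical.

```python
import math

# (2n, coefficients of e^{-t}, e^{-2t}, ...) for the five basis functions.
KAUTZ = [
    (2, [1]),
    (4, [-2, 3]),
    (6, [3, -12, 10]),
    (8, [-4, 30, -60, 35]),
    (10, [5, -60, 210, -280, 126]),
]


def step_coefficients(tau=0.0):
    """Coefficients of the unit step u(t - tau) on the five basis functions."""
    return [math.sqrt(scale) * sum(c * math.exp(-k * tau) / k
                                   for k, c in enumerate(coeffs, start=1))
            for scale, coeffs in KAUTZ]


def pulse_coefficients(width, amplitude=1.0):
    """Coefficients of a rectangular pulse: A * [u(t) - u(t - width)]."""
    start = step_coefficients(0.0)
    end = step_coefficients(width)
    return [amplitude * (s - e) for s, e in zip(start, end)]


print([round(a, 4) for a in pulse_coefficients(0.1)])
```

For a unit pulse of width 0.1 this should agree, to the tabulated precision, with the first row of the Example 1 coefficient table (.1346, .1632, .1514, .1116, .0578).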


In  the  initial  runs,  a  cutoff  limit  of  20  iterations  was  written 
into  the  program  to  insure  against  excessive  wasted  time  in  the  event 
that  the  configuration  did  not  converge. 
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As  was  stated  in  Chapter  IV,  the  matrix  equation 


$$\lvert B' - I\lambda \rvert = 0$$  (36)

has  m  eigenvalues,  of  which  at  least  (m-n)  should  be  zero.  As  verifi¬ 
cation  that  this  part  of  the  program  was  correct,  a  collection  of  20 
random  decaying  exponentials  was  expanded  on  the  Kautz  basis,  and  the 
B  matrix  was  calculated  directly  from  this  input  data,  without  the 
collapsing  operation  being  performed.  The  total  number  of  eigenvalues 
so  obtained  should  have  been  20,  of  which  15  should  have  been  zero. 

The  eigenvalues  calculated  are  shown  in  Table  2. 


TABLE 2

Eigenvalues

 1.   7.33867       11.   0.00000
 2.   1.93994       12.   0.00000
 3.   1.49168       13.   0.00000
 4.   0.32322       14.   0.00000
 5.   0.14997       15.   0.00000
 6.   0.00000       16.   0.00000
 7.   0.00000       17.   0.00000
 8.   0.00000       18.   0.00000
 9.   0.00000       19.   0.00000
10.   0.00000       20.   0.00000
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This  proved  to  be  the  case  in  every  run,  that  is,  eigenvalues  6  to  20 
were  always  zero.  Therefore,  in  the  examples  to  follow,  only  the 
first  five  eigenvalues  are  shown. 
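The reason at least m - n eigenvalues vanish can be stated in one line. Writing the centroid-referred coordinates as an m × n matrix $A_c$ (a symbol introduced here for illustration), the matrix of scalar products is its Gramian:

```latex
B = A_c A_c^{\mathsf T}, \qquad A_c \in \mathbb{R}^{m \times n}
\quad \Longrightarrow \quad
\operatorname{rank}(B) = \operatorname{rank}(A_c) \le \min(m, n)
```

With m = 20 signals and n = 5 basis functions, B is 20 × 20 but has rank at most 5, so at least 15 of its eigenvalues are zero, as observed in Table 2.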

In Chapter IV, the postulate was put forth that a single parameter class of signals should generate a locus of points in signal space which describes a curve. To gain some insight into the shape such a curve might have, a set of 20 rectangular pulses was expanded on the Kautz basis. The pulse width was varied from 0.1 to 2.0 seconds, and the height was held constant at unity. It is, of course, not possible to depict the resulting curve on a single plot, but three projections of the curve, each on a plane spanned by a pair of basis functions, are shown in Figures 9, 10, and 11. These projections show the great nonlinearity of the curve, and also indicate that about a 10 to 1 signal to noise energy ratio might be expected to cause the program to collapse the configuration to a line.

In the examples to follow, signals of one or two parameters were expanded on the Kautz basis, and their intrinsic dimensionalities were determined by the program. The same set of control parameters, α, β, γ, and ε, was used in all cases. Examples 1 and 3 were single parameter classes of signals, while Examples 2 and 4 had two independently varying parameters. Example 5 shows the effect of dependence in the variation of two parameters. It is essentially the same class of signals as in Example 2, but the program identifies it as a single-parameter class due to the dependent parameter variation.


[Figures 9, 10, and 11: Projections of the rectangular-pulse curve onto pairs of Kautz coefficients; axis labels are illegible in the source]


A.  Example  1 


Signals used - Rectangular pulses of unit amplitude with width τ varied from 0.1 to 2.0 seconds in equal steps. (This is the signal collection depicted in Figures 9, 10, and 11.) 20 signals.

Parameter values:  α = 0.05    γ = 0.04
                   β = 0.75    ε = 0.005

Number of iterations required = 8

Running time = 3 hundredths hour

Eigenvalues obtained:  35.914352
                         .000008
                         .000000
                         .000000
                         .000000
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EXAMPLE  1 


Coefficients on Kautz Basis

Signal
Number      a1        a2        a3        a4        a5
   1      .1346     .1632     .1514     .1116     .0578
   2      .2564     .2640     .1707     .04165   -.0582
   3      .3665     .3168     .1188    -.0649    -.1230
   4      .4662     .3333     .0352    -.1453    -.1044
   5      .5564     .3225    -.0557    -.1800    -.3306
   6      .6381     .2917    -.1395    -.1710    +.0521
   7      .7119     .2466    -.2083    -.1289     .1226
   8      .7788     .1916    -.2588    -.0663     .1638
   9      .8392     .1304    -.2905     .0052     .1722
  10      .8940     .0655    -.3043     .0763     .1514
  11      .9435    -.0009    -.3023     .1405     .1086
  12      .9883    -.0674    -.2866     .1937     .0523
  13     1.029     -.1327    -.2599     .2336    -.0098
  14     1.065     -.1960    -.2243     .2597    -.0712
  15     1.099     -.2568    -.1822     .2721    -.1268
  16     1.129     -.3147    -.1352     .2721    -.1734
  17     1.156     -.3694    -.0852     .2611    -.2092
  18     1.180     -.4208    -.0335     .2408    -.2334
  19     1.203     -.4688    +.0189     .2128    -.2460
  20     1.223     -.5136    +.0709     .1789    -.2479
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B.  Example  2 


Signals used - Rectangular pulses with both amplitude, A, and width, τ, varied as shown below:

Signal                     Signal
Number     τ      A        Number     τ      A
   1      0.1    0.6         11      0.1    1.0
   2      0.4    0.6         12      0.4    1.0
   3      0.7    0.6         13      0.7    1.0
   4      1.0    0.6         14      1.0    1.0
   5      1.3    0.6         15      1.3    1.0
   6      0.1    0.8         16      0.1    1.2
   7      0.4    0.8         17      0.4    1.2
   8      0.7    0.8         18      0.7    1.2
   9      1.0    0.8         19      1.0    1.2
  10      1.3    0.8         20      1.3    1.2

Parameter values:  α = 0.05    γ = 0.04
                   β = 0.75    ε = 0.005

Number of iterations required = 5

Running time = 3 hundredths hour

Eigenvalues obtained:  27.56159
                        1.19406
                         .00001
                         .00000
                         .00000
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EXAMPLE  2 

Coefficients on Kautz Basis

Signal
Number      a1        a2        a3        a4        a5
   1      .0808     .0981     .0909     .0670     .0347
   2      .2780     .1995     .0212    -.0872    -.0627
   3      .4275     .1484    -.1250    -.0774     .0726
   4      .5360     .0393    -.1827     .0458     .0909
   5      .6170    -.0795    -.1559     .1401    -.0059
   6      .1078     .1306     .1211     .0893     .0462
   7      .3730     .2666     .0282    -.1162    -.0835
   8      .5695     .1975    -.1666    -.1031     .0981
   9      .7152     .0524    -.2434     .0610     .1211
  10      .8232    -.1062    -.2079     .1869    -.0078
  11      .1346     .1632     .1514     .1116     .0578
  12      .4662     .3333     .0352    -.1453    -.1044
  13      .7119     .2466    -.2083    -.1289     .1226
  14      .8940     .0655    -.3043     .0763     .1514
  15     1.029     -.1327    -.2599     .2336    -.0098
  16      .1616     .1961     .1818     .1340     .0694
  17      .5560     .3990     .0423    -.1744    -.1253
  18      .8550     .2968    -.2500    -.1547     .1471
  19     1.072      .0786    -.3655     .0916     .1818
  20     1.234     -.1591    -.3119     .2802    -.0119
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C .  Example  3 


Signals used - Real decaying one-sided exponentials of unit amplitude, $e^{-dt}$, with the time constant 1/d varied from 0.1 to 2.0 seconds in equal steps. 20 signals.

Parameter values:  α = 0.05    γ = 0.04
                   β = 0.75    ε = 0.005

Number of iterations required = 8

Running time = 4 hundredths hour

Eigenvalues obtained:  38.517118
                         .000010
                         .000000
                         .000000
                         .000000
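The coefficients of a decaying exponential on this basis also have a closed form, since $\int_0^\infty e^{-kt} e^{-dt}\,dt = 1/(k + d)$. The sketch below assumes the signal is $e^{-dt}u(t)$ with decrement d; the function name is hypothetical.

```python
import math

# (2n, coefficients of e^{-t}, e^{-2t}, ...) for the five basis functions.
KAUTZ = [
    (2, [1]),
    (4, [-2, 3]),
    (6, [3, -12, 10]),
    (8, [-4, 30, -60, 35]),
    (10, [5, -60, 210, -280, 126]),
]


def exponential_coefficients(d):
    """Coefficients of exp(-d t) u(t) on the five basis functions."""
    return [math.sqrt(scale) * sum(c / (k + d) for k, c in enumerate(coeffs, start=1))
            for scale, coeffs in KAUTZ]


print([round(a, 5) + 0.0 for a in exponential_coefficients(1.0)])
print([round(a, 4) for a in exponential_coefficients(10.0)[:2]])
```

For d = 1 the signal is proportional to the first basis function, so only the first coefficient (0.70711) is non-zero, which matches the row of the Example 3 table consisting of .70711 followed by zeros. For d = 10 the first two coefficients come out close to the tabulated .12856 and .1364.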
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EXAMPLE  3 

Coefficients on Kautz Basis

Signal
Number      a1         a2         a3         a4        a5
   1      .12856     .1364      .1030      .0594     .032
   2      .2357      .1905      .0874      .022      .003
   3      .32636     .2019      .0522      .001      0.
   4      .40405     .1905      .0210     -.001      0.
   5      .4714      .1667      0.         .0003     0.
   6      .53033     .1363     -0.012      .0020     0.
   7      .58232     .1029     -0.016      .0045     0.
   8      .62853     .06840    -0.015      .0065     0.
   9      .66990     .03380    -0.0091     .0040     0.
  10      .70711     0.         0.         0.        0.
  11      .74078    -.0327      0.011     -0.0051    0.
  12      .77139    -.0642      0.024     -0.0120    0.
  13      .79934    -.0942      0.0377    -0.0210    0.
  14      .82495    -.1228      0.0519    -.0291     0.
  15      .84853    -.1500      0.0669    -.0385     .03
  16      .87028    -.1758      0.0816    -.0478     .03
  17      .89043    -.2004      0.0968    -.0591     .06
  18      .90914    -.2236      0.1110    -.0696     .06
  19      .92655    -.2457      0.1260    -.0795     .06
  20      .94281    -.2667      0.1400    -.0897     .06
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D.  Example  4 


Signals used - Real decaying one-sided exponentials with decrement, d, and amplitude, A, varied as shown below.

Signal                     Signal
Number     d      A        Number     d      A
   1      10     0.6         11      10     1.0
   2      2.5    0.6         12      2.5    1.0
   3      1.5    0.6         13      1.5    1.0
   4      1.0    0.6         14      1.0    1.0
   5      .75    0.6         15      .75    1.0
   6      10     0.8         16      10     1.2
   7      2.5    0.8         17      2.5    1.2
   8      1.5    0.8         18      1.5    1.2
   9      1.0    0.8         19      1.0    1.2
  10      .75    0.8         20      .75    1.2

Parameter values:  α = 0.05    γ = 0.04
                   β = 0.75    ε = 0.0005

Number of iterations required = 5

Running time = 3 hundredths hour

Eigenvalues obtained:  44.8376
                         .1519
                         .0010
                         .0000
                         .0000
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EXAMPLE  4 


Coefficients on Kautz Basis

Signal
Number      a1        a2        a3        a4        a5
   1      .0772     .0818     .0618     .0356     .0192
   2      .2425     .1143     .0126    -.0006     0.
   3      .3494     .0617    -.0096     .0027     0.
   4      .4243     0.        0.        0.        0.
   5      .4796    -.0565     .0226    -.0126     0.
   6      .1029     .1091     .0824     .0475     .0256
   7      .3233     .1524     .0168    -.0008     0.
   8      .4658     .0823    -.0128     .0036     0.
   9      .5657     0.        0.        0.        0.
  10      .6394    -.0753     .0302    -.0168     0.
  11      .1286     .1364     .1030     .0594     .0320
  12      .4041     .1905     .0210    -.0010     0.
  13      .5823     .1029    -.0160     .0045     0.
  14      .7071     0.        0.        0.        0.
  15      .7993    -.0942     .0377    -.0210     0.
  16      .1543     .1636     .1236     .0712     .0384
  17      .4849     .2286     .0252    -.0012     0.
  18      .6988     .1235    -.0192     .0054     0.
  19      .8485     0.        0.        0.        0.
  20      .9592    -.1130     .0452    -.0252     0.
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E.  Example  5 

Signals used - Rectangular pulses with both amplitude and width varied, but with τ = 2 sin A, and τ varied from 0.1 to 2.0 seconds in equal steps. 20 signals.

Parameter values:  α = 0.05    γ = 0.04
                   β = 0.75    ε = 0.005

Number of iterations required = 8

Running time = 4 hundredths hour

Eigenvalues obtained:  41.03028
                         .00001
                         .00000
                         .00000
                         .00000
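The dependence between the two parameters can be made explicit. Since τ = 2 sin A fixes the amplitude once the width is chosen, the collection is a one-parameter family; writing $p_n(\tau)$ (a symbol introduced here) for the n-th coefficient of a unit-amplitude pulse of width τ:

```latex
A(\tau) = \arcsin\!\left(\tfrac{\tau}{2}\right), \qquad
s(t;\tau) = A(\tau)\,\bigl[u(t) - u(t-\tau)\bigr], \qquad
a_n(\tau) = A(\tau)\,p_n(\tau)

\tau = 0.1:\quad A \approx 0.0500,\quad a_1 \approx 0.0500 \times 0.1346 \approx 0.0067
\qquad\qquad
\tau = 2.0:\quad A = \tfrac{\pi}{2} \approx 1.571,\quad a_1 \approx 1.571 \times 1.223 \approx 1.921
```

Both values agree with the first and last rows of the Example 5 coefficient table, and every coefficient is a function of τ alone, which is why a single non-zero eigenvalue is to be expected.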
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EXAMPLE  5 


Coefficients on Kautz Basis

Signal
Number      a1        a2        a3        a4        a5
   1      .0067     .0082     .0076     .0056     .0029
   2      .0256     .0264     .0171     .0042    -.0058
   3      .0553     .0478     .0179    -.0098    -.0186
   4      .0942     .0673     .0072    -.0294    -.0211
   5      .1408     .0816    -.0141    -.0455    -.0836
   6      .1946     .0889    -.0426    -.0522    -.0159
   7      .2549     .0883    -.0746    -.0462     .0439
   8      .3201     .0788    -.1064    -.0272     .0673
   9      .3911     .0608    -.1354     .0024     .0803
  10      .4681     .0343    -.1593     .0400     .0793
  11      .5501    -.0005    -.1762     .0819     .0633
  12      .6365    -.0434    -.1846     .1247     .0337
  13      .7275    -.0938    -.1837     .1652    -.0069
  14      .8243    -.1517    -.1736     .2010    -.0551
  15      .9309    -.2175    -.1543     .2305    -.1074
  16     1.045     -.2914    -.1252     .2520    -.1606
  17     1.174     -.3750    -.0865     .2651    -.2124
  18     1.322     -.4713    -.0375     .2697    -.2614
  19     1.508     -.5877     .0237     .2668    -.3084
  20     1.921     -.8068     .1110     .2810    -.3894
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These  results  are  in  complete  agreement  with  the  "correct" 
results.  Cases  1  and  3,  the  single -parameter  cases,  produced  single 
non- zero  eigenvalues.  Cases  2  and  4,  with  two  independently  varying 
parameters,  produced  two  non- zero  eigenvalues.  Case  5,  with  two 
dependently varying parameters, produced a single non-zero eigenvalue. This is in agreement with the discussion in Chapter III, Equation 12.


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VII. 


DISCUSSION 


A.  Effect  of  Computer  Parameters 

The discussion and outlining of the computer program in Chapters IV and V avoided comment on the effects of the parameters α, β, and γ on the final results. A few such comments follow. Of course, the values of these "constants" will depend to an extent on the input data, and since this cannot be predicted, some adjustment may be necessary during the computation. From Shepard's experience with the Multidimensional Scaling problem, (13) a few indications as to reasonable starting values for α and γ were obtained. The value of β had to be considered separately, as nothing in multidimensional scaling corresponds to β.

1.  Alpha 

α determines the rate at which the interpoint distance variance increases, or, equivalently by Equation 16, the rate at which the dimensionality of the configuration decreases. Large values of α collapse the structure more rapidly, but unfortunately also do violence to the ranking of distances within the inequality chains. It is therefore prudent to use small values for α, and a few more iterations. To this end, Shepard used program parameter values corresponding to values of α from 0.01 to 0.05. Since the goals of the program described here and Shepard's program are widely different, a one-to-one correspondence in parameters does not exist, and the values α = 0.01 to 0.05 should be used only as a starting point. For the 20-signal examples of Chapter VI, α = 0.05 was satisfactory, although this may be too high if more than 20 signals are included in the collection.

2.  Gamma 

As a striking example of the danger in blindly using Shepard's parameter values, consider γ, the parameter determining the vigor with which the program resists inequality violations.

Shepard uses 0.2 for this parameter, while in the intrinsic dimensionality program, instabilities resulted if γ exceeded 0.05. In multidimensional scaling, only one chain of inequalities is used, while here the number varies as an inverse function of β. The result is that an inversion of rank in any one chain of inequalities is likely to occur in several others. This will multiply the effect of γ several times, and since γ is a feedback parameter, instability may result if γ is not held to a low value. γ = 0.04 proved satisfactory for the experimental work in Chapter VI.

3 .  Beta 

Now the second parameter, β, is more difficult to discuss. Perhaps the best way to approach it is to look at an analogy with a parameter in "clustering" problems. (17, 18) Young, in performing an analysis on a collection of signals, considers their first-order correlations, and forms a cross-correlation matrix, symmetric with 1's on the diagonal. If any element exceeds a threshold, it is replaced by unity; if not, by zero. All further work is then done with this matrix. Thus the matrix, and hence the clusters obtained, depend heavily on the choice of threshold. All that Young could suggest was that if N is the number of clusters obtained and η is the threshold, 0 < η < 1, then if the function N(η) shows small values of first derivative for a wide range of η, the threshold should be put in this range.
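The thresholding idea can be sketched as follows. How Young actually forms clusters from the thresholded matrix is not described here, so this example assumes, purely for illustration, that a cluster is a connected component of the thresholded matrix; the correlation values are invented.

```python
def threshold_matrix(corr, eta):
    """Replace every element above the threshold by 1 and every other by 0."""
    return [[1 if c > eta else 0 for c in row] for row in corr]


def count_clusters(corr, eta):
    """N(eta): number of connected components of the thresholded matrix
    (an assumed definition of 'cluster' for this sketch)."""
    adj = threshold_matrix(corr, eta)
    n = len(adj)
    seen = set()
    clusters = 0
    for start in range(n):
        if start in seen:
            continue
        clusters += 1
        stack = [start]
        while stack:
            i = stack.pop()
            if i in seen:
                continue
            seen.add(i)
            stack.extend(j for j in range(n) if adj[i][j] and j not in seen)
    return clusters


# Hypothetical correlation matrix with two well-separated groups of signals.
corr = [
    [1.00, 0.90, 0.10, 0.20],
    [0.90, 1.00, 0.15, 0.10],
    [0.10, 0.15, 1.00, 0.80],
    [0.20, 0.10, 0.80, 1.00],
]
for eta in (0.05, 0.3, 0.5, 0.7, 0.95):
    print(eta, count_clusters(corr, eta))
```

For this invented matrix N(η) stays at 2 over the range 0.3 to 0.7 and changes only near the extremes, so a threshold inside that flat range would be the natural choice under Young's suggestion.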

Now, a similar phenomenon will be observed as the value of β is varied. For very small β, each chain of inequalities contains only a few distances to be ranked. The configuration is very loosely defined and may collapse into a line no matter what the original configuration. On the other hand, large values of β place nearly every possible distance, $d_{ij}$, in each chain. From Shepard's results this means that the program would repeatedly reproduce the original data with no collapsing.

This may be thought of as arising from the discrete nature of the input data. Through any finite collection of points in space, many curves, surfaces, etc., may be passed. The problem is one of finding a "best" number of dimensions for a discrete collection in the sense that as the number of signal samples increases and the collection of points begins to approximate a continuum, it is desired that the dimensionality of that continuum be the same as the "best" estimate. One approach is to follow Young, and determine whether the value of k is invariant under a wide variation of β. If so, the midpoint of this range of β should be used, and the resulting value of k taken as the best estimate of the true value. In this manner, the value of β for the examples of Chapter VI was set at 0.75.
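The selection rule can be written down in a few lines: scan the estimates of k obtained for increasing β, find the widest range of β over which k does not change, and take its midpoint. The helper name and the sample values below are hypothetical and do not come from the thesis runs.

```python
def widest_plateau(betas, ks):
    """Return (k, midpoint of beta) for the widest run of constant k.

    betas must be sorted in increasing order; ks[i] is the estimated
    dimensionality obtained with betas[i].
    """
    best = None
    start = 0
    for i in range(1, len(ks) + 1):
        if i == len(ks) or ks[i] != ks[start]:
            width = betas[i - 1] - betas[start]
            if best is None or width > best[0]:
                best = (width, ks[start], 0.5 * (betas[start] + betas[i - 1]))
            start = i
    return best[1], best[2]


# Invented sweep: k collapses to 1 for small beta and jumps for large beta.
betas = [0.25, 0.5, 0.75, 1.0, 1.25, 1.5]
ks = [1, 1, 2, 2, 2, 5]
print(widest_plateau(betas, ks))
```

In this invented sweep the estimate k = 2 is stable from β = 0.75 to β = 1.25, so the rule returns k = 2 with β = 1.0. Ties between equally wide plateaus are resolved in favour of the first one, which is an arbitrary choice of this sketch.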
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B.  Future  Work 


Extensions of this work are needed in several areas. The first of these is the study of the effect of noise added to the input data. This was briefly discussed in Chapter III, but a complete investigation has not been undertaken. It is expected that a threshold effect will occur when additive noise is present in the data, and both theoretical and experimental work should be performed to verify or disprove this supposition. Various probability distributions for the noise should be considered, as well as the relationship between signal-to-noise ratio, number of signal samples, and frequency of errors in estimating dimensionality.

Another area of needed research involves Assumption 2 of Chapter III, Equation 11. This is the assumption which involves a complete independence of the parameter variations in the signal generator. It has been stated that independent variations will yield the correct results, while two parameters which are functionally dependent will appear to the computer program to be a single parameter, thus reducing the apparent dimensionality by one. No mention has been made of a possible statistical relationship between two parameters. This would be the case where the statistics of the variation of one parameter are modified in accordance with the value of a second parameter. Errors here would depend on the correlation between the two variations, and an investigation of such errors would be of value.
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Another  important  area  for  more  work  is  the  question  of 
stability of the computer program. As previously mentioned, β is a feedback parameter, and wherever feedback is involved, the stability problem arises. In early runs with the program described here, convergence was not obtained. A reduction in the value of β corrected the situation, and convergence was obtained. A means for eliminating this trial-and-error process, or at least automating it, would be of value. Excessive values of β resulted in divergence of the configuration, accompanied by a sharp decrease in the interpoint distance variance. This decrease was monitored, and when it occurred, the program was stopped. It may be possible to continuously monitor the variance changes from one iteration to the next, and modify β as the computer run progresses. This is only an ad hoc approach, however, and a full investigation of stability criteria would result in the saving of considerable computer time which must be wasted now before a satisfactory value for β is established for each set of data.

Such  an  investigation  should  be  carried  out  in  conjunction  with  the 
additive  noise  investigation  because  the  stability  of  the  program  will 
almost  certainly  be  dependent  on  the  noise  present  in  the  input  data. 
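The ad hoc monitoring idea could look something like the following sketch. It is not part of the thesis program: the function name, the 50 percent drop criterion, and the halving factor are all assumptions chosen for illustration.

```python
def adapt_feedback(value, variances, drop=0.5, shrink=0.5):
    """Reduce a feedback parameter when the interpoint distance variance
    falls sharply from one iteration to the next.

    value     -- current value of the feedback parameter
    variances -- variance recorded after each iteration so far
    drop      -- fractional decrease regarded as a sign of divergence (assumed)
    shrink    -- factor applied to the parameter when that happens (assumed)

    Returns (new_value, reduced_flag).
    """
    if len(variances) >= 2 and variances[-1] < (1.0 - drop) * variances[-2]:
        return value * shrink, True
    return value, False


# Variance rising slowly: keep the parameter. Variance collapsing: halve it.
print(adapt_feedback(0.04, [0.30, 0.31, 0.32]))
print(adapt_feedback(0.2, [0.30, 0.31, 0.10]))
```

A run that triggers the reduction would presumably also need to restart from the last stable configuration; that bookkeeping is omitted here.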

Of  course,  the  most  important  future  work  with  this  program 
will  be  in  applying  it  to  actual  signal  analysis  problems.  A  few  such 
problems  are  suggested  in  the  conclusion. 
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C.  Conclusion 


A  definition  of  the  intrinsic  dimensionality  of  a 
signal  collection  has  been  formulated,  and  a  computer  program 
capable  of  evaluating  this  attribute  of  the  collection  has  been 
written  and  evaluated.  The  program  for  making  these 
estimations  has  several  advantages  over  the  conventional 
linear  signal  analysis  techniques. 

1.  It  uses  the  data  points  themselves,  not  an 
inferred  continuous  plot. 

2.  It  is  insensitive  to  changes  in  the  basis  on  which 
the  signals  are  projected. 

3.  It  assumes  no  prior  knowledge  of  the  signals, 
except  that  most  of  their  energy  can  be 
represented  on  the  basis  used. 
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